Eigen-MLLR environment/speaker compensation for robust speech recognition
Abstract
However, it is generally difficult, if not impossible, to prepare a complete set of a priori knowledge about noisy environments. In particular, noisy environments that are not seen in the training phase (unseen noisy environments) can cause serious performance degradation for non-blind methods. It is therefore crucial to organize the a priori noisy environment knowledge well and to use it efficiently. To take advantage of this knowledge and, at the same time, alleviate the problem of unseen noisy environments and speakers, this paper adopts an Eigen-MLLR method originally proposed only for speaker adaptation [7-8]. An eigen-maximum likelihood linear regression (Eigen-MLLR) method is proposed that uses a set of a priori noisy environment/speaker knowledge to compensate online for the characteristics of an unknown test environment/speaker. The idea is straightforward, but it is motivated by our recent finding that the characteristics of different kinds of noisy environments and speakers can be organized well, and simultaneously, in a PCA-constructed Eigen-MLLR subspace. In particular, the first three dimensions of the constructed Eigen-MLLR subspace are highly related to the SNR value, gender, and type of noise. The proposed Eigen-MLLR method was evaluated on the Aurora 2 multi-condition training task. Experimental results showed that an average word error rate (WER) of 6.14% was achieved. Eigen-MLLR outperformed not only the multi-condition training baseline (Multi-Con., 13.72%) but also the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%), histogram equalization (HEQ, 8.66%), and the non-blind reference model weighting (RMW, 7.29%) approaches.
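To put the reported figures on a common scale, the short sketch below computes the relative WER reduction of Eigen-MLLR over each reference system. It uses only the WER values quoted in the abstract; the function name and dictionary layout are illustrative choices, not part of the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Average WERs (%) as quoted in the abstract for the Aurora 2 task.
reported_wer = {
    "Multi-Con.": 13.72,
    "ETSI-Adv.": 8.65,
    "HEQ": 8.66,
    "RMW": 7.29,
}
eigen_mllr_wer = 6.14

reductions = {
    name: relative_wer_reduction(wer, eigen_mllr_wer)
    for name, wer in reported_wer.items()
}

for name, reduction in reductions.items():
    print(f"Eigen-MLLR vs {name}: {reduction:.1f}% relative WER reduction")
```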
This idea is straightforward (from the viewpoint of speaker adaptation), but it is motivated by our recent finding [5] that the characteristics of different kinds of noisy environments and speakers (represented by a set of MLLR super-matrices) can be organized well, and simultaneously, in a PCA-constructed Eigen-MLLR subspace. In particular, the first three dimensions of the constructed Eigen-MLLR subspace are highly related to the SNR value, gender, and type of noise. It is therefore possible to (1) analyze a set of environment/speaker characteristics collected from all seen noisy environments/speakers in the training phase to construct an Eigen-MLLR environment/speaker subspace, and then (2) optimally estimate (in the maximum-likelihood sense) the characteristics of the unknown noisy environment/speaker in the test phase to compensate the HMMs of the ASR engine.
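The two steps above can be illustrated with a minimal NumPy sketch. It is a simplification under stated assumptions, not the paper's implementation: each seen environment/speaker is assumed to be summarized by one MLLR mean transform W = [b A], the test frames are assumed to be hard-aligned to Gaussians with diagonal covariances, and the function and variable names are hypothetical. Under these assumptions the maximum-likelihood estimate of the subspace coordinates reduces to a weighted least-squares problem.

```python
import numpy as np


def build_eigen_mllr_subspace(transforms, num_dims):
    """Step 1: PCA over vectorized MLLR matrices of seen environments/speakers.

    transforms: array of shape (N, d, d + 1), each matrix W = [b A].
    Returns the mean super-vector and an orthonormal basis (num_dims rows).
    """
    supervectors = transforms.reshape(len(transforms), -1)
    mean = supervectors.mean(axis=0)
    # Right singular vectors of the centered data are the PCA directions.
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:num_dims]


def estimate_transform(mean, basis, frames, align, gauss_means, gauss_vars):
    """Step 2: ML estimate of the subspace coordinates for test data.

    Assumes hard alignments (align[t] is the Gaussian index of frame t)
    and diagonal covariances, so the ML solution is weighted least squares.
    Returns the coordinates and the reconstructed transform W.
    """
    d = gauss_means.shape[1]
    w0 = mean.reshape(d, d + 1)                    # mean transform
    eig = basis.reshape(-1, d, d + 1)              # (K, d, d + 1)
    # Extended mean vectors [1, mu] for every Gaussian.
    xi = np.hstack([np.ones((len(gauss_means), 1)), gauss_means])
    base = xi @ w0.T                               # (M, d) means under w0
    shifts = np.einsum("kij,mj->mki", eig, xi)     # (M, K, d) per-direction shift
    v = shifts[align]                              # (T, K, d)
    resid = frames - base[align]                   # (T, d)
    inv_var = 1.0 / gauss_vars[align]              # (T, d)
    # Normal equations of the variance-weighted least-squares problem.
    gram = np.einsum("tki,ti,tli->kl", v, inv_var, v)
    rhs = np.einsum("tki,ti,ti->k", v, inv_var, resid)
    weights = np.linalg.solve(gram, rhs)
    return weights, w0 + np.tensordot(weights, eig, axes=1)
```

The adapted mean of Gaussian m is then W applied to its extended mean vector [1, mu_m]. A full system would replace the hard alignments with HMM state occupation probabilities accumulated over the test utterance.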
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
一個結合SVM 與Eigen-MLLR 新的多語者線上調適架構應用於泛在語音辨識系統 (A New On-Line Multi-Speaker Adaptation Architecture Combining SVM with Eigen-MLLR for Ubiquitous Speech Recognition System) [In Chinese]
Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation
This paper investigates the use of maximum likelihood linear regression (MLLR) for both speaker and environment adaptation. MLLR transforms the mean and variance parameters of a set of HMMs. In this paper a number of different types of linear transformations of the variances are examined including full, block diagonal, and diagonal transformation matrices. Experiments on large vocabulary speake...
Mean and variance adaptation within the MLLR framework
One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker dependent (SD) performance with only small amounts of speaker specific data, and are often based on initial speaker independent (SI) recognition systems. Some of these speaker adaptation techniques may also...
Publication date: 2008